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The estimation of model parameters is an important subject in engineering. In this area of work, the prevailing 
approach is to estimate or calculate these as deterministic parameters. In this study, we consider the model 
parameters from the perspective of random variables and describe the general form of the parameter distribution 
inference problem. Under this framework, we propose an ensemble Bayesian method by introducing Bayesian 
inference and the Markov chain Monte Carlo (MCMC) method. Experiments on a finite cylindrical reactor and 
a 2D IAEA benchmark problem show that the proposed method converges quickly and can estimate parameters 
effectively, even for several correlated parameters simultaneously. Our experiments include cases of engineering 
software calls, demonstrating that the method can be applied to engineering, such as nuclear reactor engineering. 
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I. INTRODUCTION 


With the depth of its intellectual development, mathemati- 
cal modeling and simulation (M&S) have emerged as a pow- 
erful tool that promises to revolutionize engineering and sci- 
ence. M&S are expected to describe a system, on the basis 
of which further research can be conducted. For example, 
the Consortium for the Advanced Simulation of Light Water 
Reactors (CASL), was established by the US Department of 
Energy (DOE) in 2010, with the goal of providing M&S ca- 
pabilities that support and accelerate the improvement of nu- 
clear energy’s economic competitiveness and ensure nuclear 
safety [1]. 

Numerical simulations can differ significantly from exper- 
imental observations, and minimizing this difference has al- 
ways been an important topic in engineering practice [2-4]. 
The difference between model and reality is particularly true 
in reactor physics, primarily for the following reasons: 


(1) The complex nature of nuclear phenomena. A nu- 
clear system may involve neutron transport, thermal 
hydraulics, and fuel performance, among others, which 
makes it difficult to accurately model the neutronic be- 
havior. Theoretically, this can be well described by 
the Boltzmann transport equation, incorporating coef- 
ficients derived from various experimentally evaluated 
neutron interaction cross sections. These cross sec- 
tions have strong, discontinuous behavior in space due 
to material heterogeneities and extreme variations with 
energy due to resonance phenomena associated with 
compound nucleus formation [5—7]. 


(2) Inaccuracy of input parameters. A model requires in- 
put design data or explanatory data, such as geometry, 
composition, and control parameters, and input model 
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parameters, including physical constants such as inter- 
action cross sections and material thermal and mechan- 
ical properties. Particularly, the two-level approach 
(1.e., a micro-level model for determining macro cross 
sections and a macro-level model for determining the 
final response of interests) for reactor physics calcu- 
lation may lead inexact input parameters [5-7]. As a 
consequence, inexact parameters will clearly cause in- 
accurate outputs. 


With the advancement of science and computational power, 
we can probably make fewer simplifications and approxima- 
tions in our models, resulting to reduced differences, whereas 
the resources spent must still be considered for economic rea- 
sons. As we reduce this difference, it becomes increasingly 
dominated by the input parameters. 

In particular, nuclear data, such as neutron cross sections 
and resonance parameters, are evaluated by combining ap- 
proximated nuclear physics and experimental observations, 
which have epistemic and aleatoric uncertainties. These un- 
certainties then propagate through a neutronics model, creat- 
ing differences and uncertainties [8]. It is easy to imagine that 
it would be the best option if we were able to re-evaluate the 
nuclear data. Unfortunately, the nuclear evaluation process is 
significantly long and costly, making it unrealistic. For ex- 
ample, [9] estimated the cost of a single new observation of a 
nuclear datum at 400,000 US dollars. 

The uncertainty of the input parameters causes uncertain 
behavior in the model, resulting to the development of three 
fields in statistics: uncertainty quantification (UQ), sensitiv- 
ity analysis (SA), and data assimilation (DA) [10]. They are 
beneficial in engineering design, with the potential to signif- 
icantly reduce unnecessary costs. For example, knowledge 
of the sensitivity of attributes has been used to reduce costly 
design margins. 

Data assimilation assimilates previous experience or exper- 
imental observations to reduce the differences and uncertain- 
ties in model predictions. It was first proposed in the field 
of meteorology and is widely used to predict meteorologi- 
cal phenomena, such as hurricanes [11]. Data assimilation 
methods can be roughly divided into two categories: varia- 
tional data assimilation [12—14] and sequential data assimi- 
lation [15-17], whereas some hybrid data assimilation tech- 


niques have emerged [18-20]. Bayesian approaches to data 
assimilation were established after [21, 22] and [23] provided 
an overview of the Bayesian perspective, discussing some ap- 
proaches from this perspective. In neutronics, data assimila- 
tion is primarily used to predict the bias of critical systems 
[24], thereby reducing the uncertainty of advanced reactor 
designs [25]. In recent years, research interest in data as- 
similation in neutrons has mainly focused on the adjustment 
of nuclear data, which could establish cross-correlations be- 
tween nuclear data or nuclides that are traditionally unavail- 
able [26, 27]. In addition, new methods based on stochas- 
tic sampling of input parameters have been proposed, such 
as MOCABA [28] and BMC [29]. In addition, [30] showed 
that these two methods yield similar results to each other, as 
well as the traditional method known as generalized linear 
least squares (GLLS). Readers can refer to [31-33] for the 
Bayesian applications in thermal hydraulics. 

The method proposed in this study involves sampling the 
probability distribution, and the method we used is MCMC. 
The MCMC method began with the Metropolis method pro- 
posed by Metropolis et al. in 1953 and was extended byW.K. 
Hastings in 1970. In 1984, [34] demonstrated how the 
method, known as the Gibbs sampler, can be adapted to high- 
dimensional problems that arise in Bayesian statistics. In 
1987, [35] combined MCMC and molecular dynamics meth- 
ods, known as the hybrid Monte Carlo method or Hamiltonian 
Monte Carlo (HMC). Later, scholars developed a series of im- 
proved HMC, such as the No-U-Turn Sampler (NUTS) [36] 
and stochastic gradient Langevin dynamics (SGLD) [37], 
based on HMC. 

Although the Bayesian approach has been widely used in 
data assimilation, there is still a lack of research on the se- 
lection of a prior distribution, which is generally selected as 
the probability representation that exists before experimen- 
tal observations, and is probably unsatisfactory. It is evident 
that the prior distribution affects the results of the method. 
This study aims to complete this task, namely, to provide the 
initial input of prior distributions for classical data assimila- 
tion methods. As an example, recent work [38] has applied 
the Bayesian neural network method to predict the beta de- 
cay lifetime of atomic nuclei and their uncertainty, which im- 
proves learning accuracy and uses prior distributions of pa- 
rameters; therefore, it may be useful to consider using the 
method proposed in this work once before the algorithm to 
improve the performance of the method with more accurate 
prior distributions. Moreover, most data assimilation meth- 
ods only provide point estimations for parameters; however, 
what if nuclear data are actually probabilistic? In fact, cer- 
tain parameters in our understanding resemble distributions 
or random variables, such as the number of neutrons released 
per fission and the change in the concentration of nuclear 
fuel over a short period due to fuel depletion. We attempt 
to fill this research gap and take the first step in this distribu- 
tion inference. Therefore, we propose an ensemble Bayesian 
method. It is also worth mentioning the following: 


(1) The proposed method is simulation-based. As strate- 
gies for the design and evaluation of complex nu- 
clear systems have shifted from heavy reliance on ex- 


pensive experimental validation to highly accurate nu- 
merical simulations, more attention has been paid to 
simulation-based approaches. This allows our method 
to play a role in guiding the overall system design as 
well as optimizing the setup of the validating bench- 
mark experiments. 


(2) The proposed method has good compatibility. Because 
there are numerous legacy codes used extensively in the 
design process, they are expected to persist as integral 
parts of nuclear design calculations. Our method can 
be incorporated in the legacy codes, making it tractable 
for nuclear systems simulation. 


The remainder of this paper is organized as follows. In 
Section 2, we describe the parameter distribution inference 
problem in a general manner and describe the corresponding 
experiment. In Section 3, we first introduce the Bayesian in- 
ference and Metropolis-Hastings algorithms, followed by the 
core algorithm. In Section 4, we analyze the convergence of 
the method and illustrate a series of numerical tests on the 
proposed method, including a finite cylindrical reactor and 
the 2D IAEA benchmark problem. Finally, we provide a brief 
conclusion in the last section. 


Il. PROBLEM DESCRIPTION 


When modeling a physical phenomenon, the first step is 
to introduce reliable theories, such as conservation laws and 
transport theory. In nuclear reactor physics, there are many 
classic models such as the finite cylindrical reactor and the 2D 
IAEA benchmark problem, which will also be presented in 
Section IV. However, it is worth emphasizing that the method 
proposed in this study is valid, regardless of the theories in- 
troduced. For clarity, we skip this step and consider the fol- 
lowing general system: 


eae = f(x, p) 
z(x, u) = y(x, u)(1 + €) 


where x € D C R? isa spacial vector, y € R is an observ- 
able physical quantity, z € R is an observation of y possibly 
with noise €, and pp = (u®,--- , u®™®)T are input parameters 
of the model. 

f represents the intrinsic functional relationship between 
these variables, which models the corresponding physical 
phenomenon. e ~ N (0, ø?) is the zero-mean Gaussian rela- 
tive noise that models the observation noise. 


, (1) 


A. Perspective of Random Variables 


In current research, most data assimilation methods only 
provide point estimations for the model parameters u. How- 
ever, certain parameters in our understanding resemble dis- 
tributions or random variables, such as the number of neu- 
trons released per fission and the change in the concentra- 
tion of nuclear fuel over a short period due to fuel deple- 
tion. Here, we deal with the latter case. Thus, y = (0) = 


(u® (0), --- , u™® (0))T is assumed to be an n-dimensional 
parameter vector dependent on another parameter 6. The 
general parameter 0 € [a,b] might be time t or burnup Bu. 
We have the following theorem [39]: Let 0 ~ U[a, 6] and 
u : R > R” bea Borel measurable map; then, (0) is an n- 
dimensional random vector. In the following discussion, we 
are concerned with the overall statistical regularity of the pa- 
rameter u within the interval [a,b] and will always treat pz as 
a random vector with an unknown real joint distribution. Our 
goal is to estimate this distribution using observational data. 


B. Experimental Setup 


To achieve our goal, a special type of experimental setup 
was designed to obtain the observational data required by the 
method proposed in the next section. Specifically, we ob- 
tained data in the following manner: 


i) Choose m number of spatial positions: x1, --- 
D and set S = {24,--- 
nodes. 


Tm E 
,Zm} to be the set of sensor 


ii) Repeatedly sample k number of 0: a < 0, < ++: < 
Ok < bin the uniform distribution U (a,b). As an ex- 
ample, if 0 means time t, 01, -+ - , 0; are the observation 
times. 


iii) At each 6;, m number of observations 21,;,--- 
are obtained through the sensors. 


Zm, j 


Because we used m sensors, we obtained the following m 
observation equations: 


zi(x, u) = y(x, u)(1 + c) i= 1, mM. (2) 


where e; is the zero-mean Gaussian relative noise of the 7- 
th sensor. It is natural to assume that they are independent 
and have a common distribution, i.e., e; ~ N(0,07),Vi = 
l,m. 

After the previous steps, we obtain m x k observations 
zi(zi, u(0;)) i = 1, ,m;j = 1,--- ,k. For simplicity, 
we use the notation 


Zij = zi(xi, u(0;)). (3) 


II. ENSEMBLE BAYESIAN METHOD 


The Bayesian approach has been widely used in data as- 
similation and is the method proposed in this study. Subse- 
quently, we provide an overview of the Bayesian inference. 


A. Bayesian Inference 


Let X and Y be the quantities of interest that cannot and 
can be directly observed, respectively,a of which the obser- 
vations make up our data. From the perspective of Bayesian 
inference, there exists a joint probability distribution of all 


Algorithm 1: Metropolis-Hastings algorithm 


Input: initial values uo. 
Output: 41, u2,- 


for u + 1 to œœ do 

Choose a new candidate state u’ according to T; that is, 
choose ju’ with probability density T, 

Calculate the acceptance probability 


= ps Ty Ly! peui . 
a= min < 1, —————-_ 2; 
Tp 1 Epa a! 
Choose a number w from uniform distribution U (0, 1); 
if w < a then 
| oa HWS 
else 
| Hu © bu-1. 
end 


u=? 


end 


unobserved and observable quantities denoted by p(x, y). 
Bayesian inference determines the conditional distribution of 
the unobserved quantities of interest given the observation 
data. This is formally accomplished by applying the Bayesian 
formula: 


ply|z)ple) 


l 4 
p(y) G 


p(x|y) = 


provided 0 < p(y) < +00 is a normalized constant. 

We obtain the formula for the posterior distribution, which 
we will sample later. Some commonly used sampling al- 
gorithms include adaptive rejection sampling. However, in 
many cases, there is no analytical solution for the posterior 
formula, which makes it difficult to use many sampling algo- 
rithms. To overcome this problem, we introduce the MCMC 
method, which is a technique for sampling probability distri- 
butions. Here, we consider the Metropolis-Hasting algorithm, 
which has unique advantages for posterior distribution sam- 


pling. 


B. Metropolis-Hastings Algorithm 


The Classical Metropolis-Hastings algorithm [40] was first 
proposed in the early 1950s and extended by W.K. Hastings 
in 1970. As the posterior distribution that we derive later is 
continuous, the algorithm for the continuous state space is 
given below. 

Let m be a probability density function that must be sam- 
pled and 7, be the value of m in s. Let T be a transition 
function for any irreducible Markov chain with the same state 
space as 7 and T’, + be the value of a conditional density func- 
tion in ¢ given condition s. Furthermore, we assumed that we 
knew how to sample from T. Chain T is used as a proposal 
chain, generating the elements of a sequence that the algo- 
rithm decides whether to accept. The steps of the Metropolis- 
Hastings algorithm are presented in Algorithm 1. 


Algorithm 2: Single-component Metropolis-Hastings 
algorithm 


Input: initial values po = (us eT) 
1 n 
Output: pı = (u, wh) 
1 n 
(ue Pn 


for u «+ 1 to co do 

u + Hui; 

for v + 1 to n do 

Choose a candidate state 6“? of yu? according to 
T®,; that is, choose 8 œ) with probability density 
T® (B®); w e pe. 

We calculate the acceptance probability 
a = min {1, uE 

Choose a number w from a uniform distribution 
U (0,1). 

if w < a then 
| uP e p™; 

else 


| pn? we we p. 


end 


end 


end 


For the Metropolis—Hastings proposal distribution, if we 
choose a symmetric distribution with the current state as the 
symmetric point, such as a uniform distribution on an inter- 
val of length two centered at the current state, or a Gaussian 
distribution with the expectation of the current state, we can 
simplify Eq. (1) to: 


a= min {1 Bu fs (5) 

Tuu- 
because Tj .,_. = Ty,_,,,- This is the main reason why 
this algorithm works well with a posterior distribution. Next, 
we use the Gaussian distribution with the expectation of the 
current state as the proposal distribution. 

When 7 is the joint probability density function of multi- 
variate random variables, the proposed chain T from which 
we know how to sample, is difficult to obtain. In this case, 
the single-component Metropolis-Hastings algorithm is used. 


We denote T: (v) as a one-dimensional proposal distribution 
given condition s, satisfying 


Ts [s 50-1) t0) 94D, s00)]T 


pe) (4) = 
me) Siw Te,fs@ oD 0) 6+ ,... g(r at 
(6) 


where See Ts [s(),.- sD t0) s+) acy dt) is the 


normalized constant that makes T® a probability density, 
and the single-component Metropolis-Hastings algorithm is 
shown in Algorithm 2. 

In reality, we can directly choose a one-dimensional distri- 
bution from which we know how to sample, such as uniform 
or Gaussian distributions, as a one-dimensional proposal dis- 
tribution T®), This avoids the difficulty of sampling from a 
multidimensional distribution. 


C. Ensemble Bayesian Algorithm 


In this section, we derive the core method combined with 
the experimental setup (observation data) in Section IIB, 
which includes three steps: Bayesian inference, Metropolis- 
Hastings sampling, and performance evaluation. 


1. Bayesian Inference 


Because 01,- -- , 0k are samples of 6 ~ Ufa, b], it is easy 
to verify that u(01), --- ,4(0;) are also samples of u(0). In 
view of this, our basic idea is to estimate the distribution 
of u by aggregating the individual estimates of u(0;),j = 
l, ke 

For each 6;, there are several observations 21,;,--+ , Zm,j- 
Denote 4; = u(0;) be the value to be estimated. From the 
perspective of Bayesian inference and the Bayesian formula, 
we have 
Zma l) (7) 


p(uj|zijs t Zm) X T(uj): plaj 


where 7(j1;) denotes the prior distribution. In an experi- 
ment, the prior distribution is usually given by assumptions 
or guesses or evaluated by traditional methods with uncer- 
tainties. 

As ei ~ N(0,07) and zij = y(vi,uj)\(1 + €i) = 
f (vi, My)(1 + &), we have 


zi jle ~ NF (ai, nj), (F (ei, Hz)0)”). (8) 
By the independent assumption of the observation noise, 


m 


5 Peg lity = |] plise), (9) 


i=l 


p(21,9; oe 


is the product of k Gaussian density functions. Finally, 


m 


»%m,j) < (uj) |] p(zi sles), a0 


i=1 


P(g lZ1,j5°°° 


where m(uj) is the prior distribution, and p(z;,;|1;) 
is the density function of Gaussian distribution 
N(f (i, uj), (f (i, Wy)o)?). 

From the previous process, we obtain posterior distribu- 
tions of all uj. These are joint probability density functions 
of multivariate random variables and are continuous; there- 
fore, they are suitable for the single-component Metropolis- 
Hastings algorithm presented in Algorithm 2. 


2. Metropolis-Hastings Sampling 


As mentioned earlier, we intend to use a single- 
component Metropolis-Hastings algorithm to construct a 
Markov chain Xo, X1,:-- whose stationary distribution is 
P(H45|21,35°** »%m,j). To describe the transition mechanism 
for Xo, X1, +++, assume that at time u the chain is at state 4u, 


ie., Xu = pu = (pP, 


1 
of Xu+ı are T ey 


; ie 9) and the first v components 
u: Then, the (v + 1)th compo- 


) 


; 1). , i 
nent of X,,+1; that is, uet is determined using a two-step 


procedure: 


i) Choose a new state 6(’+ according to the Gaus- 
sian proposal distribution N(ug t” ,a(®+1)2), That is, 
choose 6+!) with probability density 


g+) pOT 2 


1 ) 
ZOH) ; (11) 


—i 
mae) = Vino eH) © i 


It should be mentioned here that the selection of the 
standard deviation of the proposal distribution +) 
is considerable, and significantly large or small distri- 
bution will affect the results of the algorithm. We must 
carefully choose in combination with the dimension of 
parameter u+»), 


ii) Decide whether to accept 3(’+” or not. Let 


If (ult), BO+Y) > 1, then B+ is accepted as 
the (v + 1)-th component of Xy4i, ie, py? = 
per). Te afu t, BO+D) < 1, then BC+) is ac- 


cepted with probability a(per? , Ber). If BOTY is 


(v+1) 


not accepted, then pù is kept as the v + 1-th com- 


ponent of X,,41, 1e., ae = mer, In other words, 


let U be uniformly distributed on (0, 1), then 


w+) _ a < a(p@t), Be) 


re (13) 


wet) else 

Although the Markov chain Xo, X1,--- has a stationary 
distribution 7, not all outputs u; in Algorithm 2 are samples 
of p(45|21,3,°°* ; 2m,7). According to the MCMC theory, for 
any sample function of the Markov chain Xo, Xj,---, only 
when the number of iterations is sufficiently large, the pro- 
cess is approximately random sampling from the distribution 
P(Hy|21,5)° ++ s 2m,z)- 

Let Neon be the number of convergence steps, Ntot be the 
total number of iteration steps. Denote N = Nig: — Neon- 
For any j = 1,---,k, we collect N number of samples 
fj,Neont11*** > Êj, Nior frOM p(uj|Z1,j'** , Zm,j) and aggre- 
gate all of them to obtain a frequency distribution as estima- 
tion of the real distribution of ju. 

MCMC is known to suffer from the problem of dependence 
between samples because the generated samples are imple- 
mented through a Markov chain. However, our technical path 
is to examine these samples as a whole and then achieve an 
overall estimation of the parameters; thus, the impact of these 
correlations on the method is minimal. 


3. Performance Evaluation 


To quantify the quality of the estimation, we introduced 
the Kullback-Leibler divergence [41], which has been widely 


used in information theory. Let p(x) and g(x) be two prob- 
ability density functions over R”. The Kullback-Leibler di- 
vergence (KL divergence) between the distribution p(x) and 
q(x) is defined as 


KL(p\|q) = - fro) . tog dx. (14) 


K L(p||q) > 0 with equality if and only if p(x) = q(x). 
Kullback-Leibler divergence quantifies how close a probabil- 
ity distribution is to another one; that is, a larger KL diver- 
gence indicates a larger difference between the two distribu- 
tions. 

Because the Gaussian distribution is the most common type 
of distribution and is uniquely determined by its expectation 
and variance, we propose estimating the parameter using the 
Gaussian distribution with the expectation and variance being 
the mean and variance of the frequency distribution, respec- 
tively. We then evaluated the performance of the estimation 
by comparing the KL divergence of the prior distribution with 
the real distribution and that of the Gaussian distribution used 
for the estimation with the real distribution. 

Suppose p(x) and q(x) are the density functions of 
N (u1, 01) and N (12, 02), respectively. 


2 2 
02 oi + (H1 — H2 

KL(p\|q) = log = + + £ 3 

O1 05 


(15) 


IV. NUMERICAL EXAMPLES 


In this section, we examine the effectiveness of the pro- 
posed method. Two models in nuclear reactor physics, 
namely the finite cylindrical reactor and the 2D IAEA bench- 
mark problem, are tested below. The first test case, adapted 
from [42], has an analytical solution; the 2D IAEA bench- 
mark problem is one of a classical benchmark problem in 
nuclear physics [42], and has only a numerical solution. To 
verify the effectiveness of the distribution estimation method, 
some parameters in these models were assumed to be uncer- 
tain in the following tests: 


A. The Finite Cylindrical Reactor 


A finite cylindrical reactor is a multiplying system of a uni- 
form reactor in the shape of a cylinder of physical radius R 
and height H. Its behavior is modeled using the following 
two-dimensional monoenergetic diffusion equation: 


1 
DAQ(r, z) = Lad(r, z) = Eo z) 
eff ; 
(Re, z) =0, ¢(r, He) =0,r€ (0, Re); ZE (0, He) 
(16) 
where Re = R + d, He = H +d, and d is the extrapolated 
length. 


1 v v+ v+2 n 
(v+1) (v+1)) _ p(w). „u€, BS ) us te any Merge smi) 
Au TE] 0) O) O+) 0+3) m) 
PHa s Matis Hu > Hù E fw irp ,Zm,j) 
ub v n n 1 vt1 n 
mud BE +D) eee ul D Feau ond ee i) (12) 
1 v+1 n 1 + n 
(ued, js sph Moves pe) cafasul HD ee ph) 
lym [its Fleet BOT mi? beep festa a TY a PY 
é RERA Fe wD BOT. ugo) (Fp weg we TD, ule)? 


By replacing the Laplacian with its cylindrical form, as 
shown in Figure 1, the analytical solution of (16) is as fol- 
lows when kes ¢ = 1: 


(17) 


a N 
+o) AAs N 
1 \ 
H 4 $ ih (r) H 2D Laplacian: 
1 1 =e a = 
a ae ] ve |! 3 10 8? 
ee ee S l gta Se 
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Fig. 1. Geometry of the finite cylindrical reactor [43]. 


Our goal is to estimate the distribution of certain uncer- 
tain parameters from the observations. Later, we compare 
two alternative schemes and examine the convergence of the 
method by applying them to single-parameter estimates, fol- 
lowed by multiparameter estimates. We set A = 20, He = 18 
and use the ensemble Bayesian method proposed in the pre- 
vious section to estimate the distribution of the parameter Re. 
In other words, we consider the following system: 


y(x, u) = 20- ( 


2(x, u) = y(x, 1)(1 + €) 


(18) 


where x € [0.5, 7.5] C R is one-dimensional spatial position, 
H is the input parameter we are interested in and is assumed 
to have a real distribution, Jo is the zeroth-order first-order 
Bessel function. y € R is an observable physical quantity, 
and z € R is the observation of y with noise e. In this study, 
observations were prepared using synthetic data from simula- 
tions rather than real observations. 

To prepare the observation data for our experiments, we 
first select m positions at equal intervals in [0.5,7.5] and re- 
peatedly obtain the observations for the corresponding po- 
sitions under k pairs of u sampled from the real distribu- 
tion. Suppose e ~ N(0, V0.2), the real distribution of u 


is N(11.0, 0.36) and the prior distribution is N (12.0, 0.25), 
which could be an empirical estimate. In this case, the num- 
ber of sensors m, number of samples k, number of conver- 
gence steps, and total number of steps of the Metropolis- 
Hastings algorithms Ncon and Ntot, which are introduced 
in Section IIB, affect the results. Subsequently, we fix the 
value of N = Niot — Neon = 100 and discuss the effect of 
m,k, Neon on the result. Before doing so, we compare two 
alternative schemes after sampling from the posterior distri- 
bution P(E5 121,55 tee mj): 


1. Comparison of Two Alternatives Schemes 


In this subsection, we compare these two schemes. Be- 
cause 41,''* , Hk are k samples of u, a straightforward idea 
is to restore 41,..., 4%. After introducing the Metropolis- 
Hastings algorithm, we can sample from the posterior distri- 
bution p(uj|z1,j,*** ,Zm,j) and traditional practice is to es- 
timate u; using samples from p(j1;|21,;,°*- » 2m,j). Another 
scheme is to aggregate all samples from the posterior distribu- 
tions p(f15|21,;,°°* Zm, j) J = 1l, +++ , k, as proposed earlier, 
rather than using their means. 

Figure 2 (a) and (b) present the results of the first and sec- 
ond schemes, respectively, with m = 200, k = 200, Neon = 
100. We can see that although the frequency distributions 
obtained by these two schemes have almost the same expec- 
tation, their variances are quite different; that is, the second 
scheme will obtain a variance much closer to the real case. 
We believe this was caused by the loss of observation infor- 
mation during the averaging step. 

More importantly, the estimation of the second scheme is 
intuitively much better than that of the first. This is because 
more samples are obtained in the second scheme, resulting in 
a more robust estimation. 


2. Convergence Analysis 


In this subsection, we discuss the effect of the parameters 
of the method on the result (i.e., the number of sensors m, 
number of samples k, and number of convergence steps of the 
Metropolis-Hastings algorithm Neon) and analyze the conver- 
gence of the method by setting N = Not — Neon = 100. 

Figure 3 shows the effect of Neon on the performance of 
the proposed method under different settings of (m, k). This 
shows that the Metropolis-Hastings sampling algorithm used 
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Fig. 2. Finite cylindrical reactor problem: comparison of schemes 1 and 2, in which scheme 1 aggregates k number of means whereas scheme 
2 aggregates k x N number of samples. Not only is the result of scheme 2 intuitively better, but scheme 2 also obtains expectation and 


variance, particularly the variance, closer to the real values. 
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Fig. 3. Finite cylindrical reactor problem: the effect of increasing Neon on the results under three sets of parameters (m, k), i.e., 
(100, 100), (150, 150), (200, 200) respectively. The curves in all three cases show that the effect of Neon on the results is minimal. 


in the ensemble Bayesian method converges rapidly. When 
Neon 1s somewhat large, such as Neon = 100, the increase 
hardly affects the performance of the method. 


Figure 4 shows the effect of k on the performance of the 
method under different settings of m when Neon, = 100 is 
fixed. This shows that when k is small, there is a fluctuation 
in performance, and as k increases, the performance tends 
to stabilize; that is, the expectation and variance tend to be 
closer to the real values, which are much closer than the prior 
values. 


Figure 5 shows the effect of m on the performance of the 
method when Neon = 100 and k = 200 are fixed. This 
shows that as m increases, the performance of the method 
improves; that is, the expectation and variance converge to the 
neighborhoods of the real values, and this expectation trend 


is particularly pronounced. 


After applying the ensemble Bayesian method by setting 
m = 200,k = 200, Neon = 100, we obtained the frequency 
distribution shown in Figure 2 (b). Intuitively, this frequency 
distribution provides an effective estimation of the real distri- 
bution. Equation (19) describes the KL divergence calcula- 
tions. From the perspective of the KL divergence, we again 
conclude that the distribution for the estimation derived us- 
ing the ensemble Bayesian method is much closer to the real 
distribution than the prior one. 


K L(Pprior||Preal) = 1.418, KL (pest||Preat) = 0.180. 
(19) 
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Fig. 4. Finite cylindrical reactor problem: when Neon = 100 is fixed, the effect of increasing number of samples k on the results under three 
sets of parameters m, i.e., 100, 150, 200 respectively. The curves in all three cases show that the performance of the method improves with 


the increase of m and quickly converges to the real situation. 
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Fig. 5. Finite cylindrical reactor problem: when Non = 100, k = 200 is fixed, the effect of increasing number of sensors m on the results. 
The curve clearly shows the convergence process of algorithm performance. Compared with Neon and k, m has the greatest influence on the 


method. 


3. Estimation of several parameters 


In this section, we test the effectiveness of the method for 
estimating several parameters simultaneously because several 
parameters are simultaneously uncertain in many cases. Now, 
we use the ensemble Bayesian method proposed in this study 
to simultaneously estimate the distributions of three parame- 
ters A, Re and He. Consider the following system: 


where x € [0.5,7.5] C R is a one-dimensional spatial posi- 
tion; u®, u®) , v3) are parameters that we are interested in 
and assumed to have a real joint distribution; Jo is the zeroth- 
order first-order Bessel function. y € R is an observable 
physical quantity, and z € R is the observation of y with 
noise € ~ N (0, V0.2). 


We experimented with the simulation datasets. To prepare 
the observational data for our experiments, we first selected 
200 positions at equal intervals in [0.5,7.5] and repeatedly ob- 
tained observations for the corresponding positions under 200 
pairs of (u® , 1), 1)) sampled from the real joint distribu- 
tion. 


Suppose the real joint distribution of (uu, uw, w®) 
be 3-dimensional Gaussian distribution with ex- 


pectation E(p, uw, u) and covariance matrix 


Cov(u™® , u® , u®)) as follows: 


0.25 0.1 0 
0.1 0.25 0 
0 0 1 


Cov(u® , p®, uO) = 


It should be noted that the correlations between them are as- 
sumed here. 

Prior distribution 7 (41, H2, u3) is also a 3-dimensional 
Gaussian distribution with expectation E,(/11, H2, H3) and 
covariance matrix Covp(41, 42, #3) as follows: 


12 
Epu, pP, u= | 19 |; 
22 
0.16 0 0 
Cov, (nu, np, p®)=| 0 016 0 
0 0 1.44 


After applying the ensemble Bayesian method by setting 
= 200,k = 200, Neon = 100, we obtained three fre- 
quency distributions, as shown in Figure 6. We find that the 
expectations and variances of all these frequency distributions 
are significantly close to the expectations and variances of the 
real marginal distributions. Intuitively, the frequency distribu- 
tions provide an effective estimation; therefore, the ensemble 
Bayesian method is also suitable for the simultaneous esti- 
mation of several parameters. From the perspective of the 
KL divergence, the estimation also shows a significant im- 
provement compared to the prior distribution. Equation (21) 
provides the calculations for the KL divergences. 


)\||Preai(H1)) = 2.431 
L(pest(H1)||Preat(u1)) = 0.300 
L Pprior (fb 2)||Preat (H2)) = 2.431 
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(21) 


L Pprior( H 3)||Preai(u3)) = 2.038 
L(pest(H3)||Preai(u3)) = 0.039 
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B. the 2D IAEA benchmark problem 


[42] gives the definition of the 2D IAEA benchmark prob- 
lem and [44] gives its implementation with neutronic code. 
Fig.7 illustrates the geometry of the reactor. Note that only 
one-quarter is given because the rest can be inferred by sym- 
metry along the x and y axes. We denote this quarter by 2, 
which is composed of four subregions with different physi- 
cal properties: Neumann boundary conditions for the left and 
bottom boundaries and the mixed boundary condition for the 
external border. 


The 2D IAEA benchmark problem as modeled by two- 
dimension two-group diffusion equations. Specifically, the 
flux ¢ = (61, 62)" (index 1 and 2 denote the high and ther- 
mal energy, respectively) satisfies the following eigenvalue 
problem[5]: Find (A, @) € C x L® (Q) x L® (Q), s.t. 


{ —V(DiV¢1) + (Sa + S1s2)¢1 = Axi (vd y1g1 + vU yp 2¢2) 
— V(D2V¢2) + Ya,262 — E1201 = Ax2(vU 4,161 + DU z2¢2) ’ 
(22) 

where X12 is the macroscopic scattering cross section 
from groups 1 to 2; Dj, Naz, © f, i, andy; are the diffusion co- 
efficient, macroscopic absorption cross section, macroscopic 
fission cross section, and fission spectrum of group 2, i € 
{1,2}, respectively; v is the average number of neutrons 
emitted per fission. In other words, these quantities are the 
model parameters. The general settings are listed in Table 1. 

Later, we assumed Də in Q4 was uncertain and used the 
proposed ensemble Bayesian method to estimate its distribu- 
tion, as it is probably one of the most important parameters. 
We assume that all other parameters are certain and set them 
according to Table 1 except for Dz in Q4. We denote this as 
Lt, which is what we are interested in. 

We experimented with simulation datasets. To pre- 
pare the observation data for our experiments, we first 
selected 45 positions (i.e., (10,10), (30, 10), (30, 30),---, 
(170, 10),--- , (170, 170)) in area Q and repeatedly obtained 
the observations for the corresponding positions under 200 
pairs of u sampled from the real distribution. 

Moreover, the two-dimensional two-group diffusion equa- 
tion eqrefeq29must be solved in the algorithm process. This 
was achieved by employing the generic high-quality finite el- 
ement solver FreeFem++ [45]. 

Suppose that the real distribution of j is a Gaussian dis- 
tribution with expectation E(u) and variance Var(p) as fol- 
lows: 


E(u) = 2.00, Var(s:) = 0.04, (23) 


where E(x) is the value listed in Table 1. 
The prior distribution 7(j/2) is also a Gaussian distribution 
with expectation E,,(j) and variance Var,(.) as follows: 


E, (1) = 2.20, Varp (u) = 0.05. (24) 


After applying the ensemble Bayesian method with setting 
Neon = 100, we obtained the frequency distribution shown 
in Figure 8. We find that the expectation of the frequency dis- 
tribution is significantly close to the real expectation, and the 
variance is also closer to the real variance than the prior dis- 
tribution. Furthermore, the frequency distribution intuitively 
provided an effective estimation. From the perspective of the 
KL divergence, the estimation also shows a significant im- 
provement compared to the prior estimation. Equation (25) 
describes the KL divergence calculations. In this case, we 
used the engineering software Freefem++ to implement our 
algorithm because this model only has numerical solutions; 
therefore, the proposed ensemble Bayesian method has po- 
tential for engineering applications. 


K L(Ppriorl |Preal) = 0.5134, K L(pest||Preal) = 0.0213. 
(25) 
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Fig. 6. Finite cylindrical reactor problem: Result of three marginal distributions when setting the number of sensors m = 200, number of 
samples k = 200, and number of convergence steps of Metropolis-Hastings algorithm Neon = 100. 


Table 1. General settings of the parameters of the IAEA 2D benchmark problem. 


Dı Dz X42 Ya 


Da2 Vusa Vf X1 X2 
1 1 1 


Region cm cm cem™! cm™! em™! cm! em Material 
Qı 1.50 0.40 0.02 0.01 0.080 0.00 0.135 1 0 Fuel | 
Qə 1.50 0.40 0.02 0.01 0.085 0.00 0.135 1 0 Fuel 2 
Qs 1.50 0.40 0.02 0.01 0.130 0.00 0.135 1 0 Fuel2+Rod 
Q4 2.00 0.30 0.04 0.00 0.010 0.00 0.000 0 0 Reflector 


Fig. 7. Geometry of 2D IBP, upper octant: region assignments, lower 
octant: fuel assembly identification [44]. 


For the finite cylindrical reactor, the computation time of 
one dimensional case (Figure 2 (b)) is 111.7s; the computa- 
tion time of three dimensional case (Figure 6) is 359.6s. For 
the 2D IAEA benchmark (Figure 8), the computation time is 
about 130000s. The main reason why 2D IAEA benchmark 
takes more time is that FreeFem++ takes up most of the run- 
time, and the most important problem is that FreeFem++ is 
not called or connected to the code efficiently enough. There- 
fore, if there is a way to call FreeFem++ efficiently, it can 
significantly save computing time. 
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Fig. 8. 2D IAEA benchmark problem: Result of the method when 
setting the number of samples k = 200 and the number of conver- 
gence steps of Metropolis-Hastings algorithm Neon = 100. 


V. CONCLUSION 


In this study, we considered model parameters from the 
perspective of random variables in the context of nuclear reac- 
tor engineering and proposed a general form of parameter dis- 
tribution inference problem. In the context of this parameter 
distribution estimation problem, we conducted a preliminary 
exploration and proposed an ensemble Bayesian method to 
estimate the parameters by obtaining frequency distributions, 
combined with a special type of experimental setup. Simulta- 
neously, we introduced the KL divergence theory to quantify 
the estimation performance of the method. Various numeri- 


cal experiments were conducted, including a finite cylindrical 
reactor model, which has an analytical solution, and the 2D 
IAEA benchmark problem, which only has a numerical solu- 
tion. From the results of these tests, the following conclusions 
were drawn: 


e For those two models, the frequency distributions intu- 
itively provide effective estimation for the parameters. 
The expectations are significantly close to the real ones, 
and the variances are also more accurate than the prior 
distributions. From the perspective of KL divergence, 
the method also has good performance; 


e The ensemble Bayesian method can estimate several 
parameters simultaenously, even if there are correla- 
tions between them; 


e When the model has only numerical solution, we use 
engineering software Freefem++ to implement our al- 
gorithm. The method also works well in this case, 
which means it has potential for engineering applica- 
tions; 


e The convergence speed of the method is fast. Thus, in 
general, the ensemble Bayesian method we propose has 
optimistic application prospects. 
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The proposed method shows high potential for engineering 
applications to correct prior distributions when using the data 
assimilation technique for parameter estimation. 

Further studies could investigate the performance of the 
proposed method under moderate-scale parameters, the per- 
formance for non-Gaussian parameters, and characteristic pa- 
rameters that are likely to be difficult to reproduce. As an im- 
portant review of machine learning (ML) in nuclear physics, 
[46] not only describes the different methodologies used in 
ML algorithms and techniques and some applications, partic- 
ularly for low- and intermediate-energy nuclear physics, but 
also provides a valuable summary and outlook on the possible 
application directions and improvements of ML algorithms 
in low- and intermediate-energy nuclear physics. Inspired by 
the review, we will improve the efficiency of the method, and 
our initial idea is to improve the efficiency of the Metropolis- 
Hastings sampling method used in this study. Furthermore, 
we will determine how to involve more physics-informed or 
physics-guided to develop new machine learning algorithms. 
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